Euclidean Bus Mobility and Route Optimization, A Comparison

Routes in New York City, NY

Author

Alan Vlakancic

Published

December 7, 2025

Introduction:

This project uses stplanr and OSRM transport modeling package to compare optimal and euclidean transport paths for bus routes in New York City. Stplanr is a transport planning visualization R package that can be used to plan transit networks, in addition to transit planning elements such as identifying transit catchment areas, buffers, origin/destination data and ride frequency visualizations, among others. In their white paper, the devlopers of stplanr call for an accountable, transparent and democratized transit planning system that doesn’t rely on proprietary and often vastly different data sources and data processing softwares. As a result, they created stplanr so that regular data analysts can analyze data rather than relying on a more mysterious algorithm.

Although the package can visualize a whole host of data, this project will focus on comparing direct desire lines or “Euclidean” routes (as the crow flies), existing bus networks and stplanr’s optimized routes. Essentially: this can map the efficiency of the bus routes compared to the most direct route possible if there were no built environment factors in the way.

Methods:

To adequately compare desire lines, and bus routes using the most efficient routes with the current street network, this project will require the below data sources. Each of these will be sourced and overlaid onto each other:

  • Data Source: stplanr package for basemap
    • Methods: stplanr will be used for its integration into OSRM data. In short, this will be used to create desire lines from origin-desitnation pairs identified using the data from New York City Open Data below. Specifically, “od2line” and “route” functions will be used. Some limitations of this package are that some of it’s functionality has migrated over to sf package, as they do fundamentally the same thing.
    • Source: stplanr CRAN
  • Data Source: leaflet package for basemap
    • Methods: This can be sourced directly into R by installing the leaflet package. It is an interactive map so the user can navigate it.
    • Source: Leaflet for R
  • Data Source: OpenStreetMap (OSMR) data for transit data, either bus or cycle routes, this is used as the route vector data. This can also calculate geometry type, and time.
  • Data Source: NYC Open Data for Bus Shelter locations
    Despite significant searching, there is no comprehensive bus stop dataset, so the project will focus on bus stop shelters, which are mapped via NYC Open Data. I used Bus Shelters as there would be thousands upon thousands of bus stops in NYC, and this would be too computationally intensive to process.
    • Methods: This is brought into R as a CSV file. Each bus stop shelter has longitude and latitude coordinates that align with leaflet and OpenStreetMap projections.
    • Source: NYC Bus Stop Shelters, NYC Transit analysis
  • Data Source: NYC Open Data for NYC Borough Boundaries

The overlay of these elements using leaflet will provide an interactive map that compares desire lines, existing bus routes, and optimized routes. The stplanr package will be used to calculate the optimized routes based on the bus shelter locations and the OpenStreetMap data. The comparison will be visualized using different colors and line styles for each type of route. This will be randomized with a 50 origin-destination pairs to make the computation manageable.

Data Preparation:

  • Load the necessary R packages for spatial data manipulation and visualization (e.g., ggmap, dplyr, stplanr, osmdata, sf, leaflet).
  • Import the NYC basemap shapefile and bus shelter CSV data into R.
  • Convert the bus shelter data into an sf object with appropriate coordinate reference system (CRS).
  • Create a bounding box for NYC to limit the data extraction from OpenStreetMap.
  • Use the osmdata package to extract primary and secondary road data from OpenStreetMap within the NYC bounding box.
  • Prepare the bus shelter data by adding unique IDs for each shelter to facilitate origin-destination pairing.
  • Generate random origin-destination pairs from the bus shelter data, ensuring that no shelter is paired with itself.
  • Create desire lines (Euclidean routes) between the origin-destination pairs using the od2line function from stplanr.
  • Compare the desire lines with optimized routes calculated using the route_osrm function from stplanr.
  • Display this data in a table & graph

Terms:

  • Desire Lines & “Euclidean”: Straight lines connecting origin and destination points, representing the most direct path between them.
  • OSRM: Open Street Routing Machine, a routing engine that uses Open Street Map data to calculate routes, shortest routes, travel times, and can be used to make travel time maps, distance routing for car, bike and walking.
  • O: Origin Data, that is, where the trip starts.
  • D: Destination Data, that is, where the trip ends.
  • R: Route.
  • od2line: A function in stplanr that converts origin-destination data into spatial lines (desire lines).
  • route_osrm: Another function in stplanr that calculates the optimal route between two points using OSRM.

Interactive Map:

Show code
#install libraries

library(ggplot2) #for plotting graphs
library(dplyr) #data manipulation
library(stplanr) #od2line calculation, origin-destination programming
library(osmdata) #open street maps data
library(sf) #spatial data
library(tidyverse) #data manipulation
library(leaflet) #interactive map
library(purrr) #for map2 function
library(kableExtra) #for attractive table formatting
library(scales) #for comma function
library(plotly) #for interactive plots
Show code
#SHAPEFILES AND MAP
#This first step creates the basemap and brings in the data for processing in later chunks

bbox <- c(left = -73.96, bottom = 40.54, right = -73.70, top = 40.81)
#create bounding box for NYC

nyc_map <- "data/"#

#nyc basemap, downloaded from nyc open data. source: https://search.r-project.org/CRAN/refmans/ptools/html/nyc_bor.html

nyc_sf <- st_read(nyc_map, quiet =TRUE)
#bring in nyc_map as a sf

shelters_sf <- read_csv("data/Bus_Stop_Shelter_20251020.csv")
#this brings in the bus stop shelter information. source: https://data.cityofnewyork.us/Transportation/Bus-Stop-Shelters/qafz-7myz

shelters_sf_fix <- st_as_sf(shelters_sf, coords = c("Longitude","Latitude"), crs = 4326)
#convert to sf object with the correct coordinate reference system

osmdata::set_overpass_url("https://overpass-api.de/api/interpreter")
#set overpass url for open street maps, finds specific data package, the below query will use this

osm_data <- opq(bbox = bbox) %>%
  add_osm_feature(key = "highway", value = c("primary","secondary")) %>%
  osmdata_sf()
#import data for primary and secondary highways from open street maps
#opq is overpass query, which is basically where you want to look
Show code
# STPLANR FUNCTIONS
#this step combines both stplanr and sf functions to create desire lines and optimized routes between O-D pairs

shelters_sf_fix <- shelters_sf_fix %>%
  mutate(id = paste0("S", row_number()))
#add ID column for origin-destination pairs so they have a corresponding number

flow_all <- expand.grid(o = shelters_sf_fix$id, #create origin
                        d = shelters_sf_fix$id, #create destination
                        stringsAsFactors = FALSE) %>% #make sure they aren't factors
  filter(o != d) %>% # this remove self-pairs so O is not D
  mutate(trips = 1) %>% #add trip count of 1 for each pair
  sample_n(50) #sample 50 random paris to avoid blowing up the computer

desire_lines_all <- od2line(flow_all, zones = shelters_sf_fix, zone_code = "id") #use od2line function to create desire lines (euclidean) for all pairs

shelter_coords <- shelters_sf_fix %>%
  st_coordinates() %>%
  as.data.frame() %>%
  bind_cols(id = shelters_sf_fix$id)
#extract coordinates and bind with ID column

route_single <- function(o_id, d_id) { #function to create a single route between origin and destination
  o <- shelter_coords %>% filter(id == o_id) 
  #filter to get origin coordinates
  d <- shelter_coords %>% filter(id == d_id)
  #filter to get destination coordinates

  r <- try(route_osrm(from = c(o$X, o$Y),
                      to   = c(d$X, d$Y)), silent = TRUE)
#use try to catch errors  (e.g., no route found)
  if (inherits(r, "try-error")) return(NULL)
#if route found, return the route
  return(r)
}

routes_list <- purrr::map2(flow_all$o, flow_all$d, route_single)
#create routes for all origin-destination pairs using the route_single function by routing in paralell
routes_list <- routes_list[!sapply(routes_list, is.null)]
#remove any NULL results (failed routes)
routes_sf <- do.call(rbind, routes_list)
#combine all routes into a single sf object
Show code
#ROUTE & DESIRE LENGTH CALCULATION, MEAN & PCT CHANGE
#these calculations are for printing as a cleaned up presentable table through kable, and comparing means and providing data to be displayed in a popup on the leaflet map

routes_projection <- st_transform(routes_sf, 32618)
desire_projection <- st_transform(desire_lines_all, 32618)
#ensures the correct, projected shapefile for computation not mapping

route_length <- st_length(routes_projection)
desire_length <- st_length(desire_projection)
#compute lengths

route_length <- as.numeric(route_length)
desire_length <- as.numeric(desire_length)
#convert lengths to numeric values

lengths_tbl <- tibble(
  route_m  = route_length,
  desire_m = desire_length,
  origin = flow_all$o,
  destination = flow_all$d
)
#create tibble to compare lengths in the final map w/ IDs

lengths_tbl_print <- tibble(
  route_km = route_length / 1000,
  desire_km = desire_length / 1000,
  origin = flow_all$o,
  destination = flow_all$d
) %>%
  mutate(
    ID = row_number(),
    difference = round(route_km - desire_km)
  ) %>%
  mutate(
    route_km = comma(round(route_km)),
    desire_km = comma(round(desire_km))
  ) %>%
  arrange(desc(difference))
#tidy data for later printing in a kable

lengths_tbl_print <- lengths_tbl_print %>%
  select(ID, origin, destination, route_km, desire_km, difference)  

mean_route <- mean(route_length, na.rm = TRUE)
mean_desire <- mean(desire_length, na.rm = TRUE)
#calculate means for both route and desire lengths

percent_change <- ((mean_route - mean_desire) / mean_desire) * 100
#calculate percent change 

mean_lengths <- data.frame(
  type = c("Route Length", "Desire Line Length"),
  mean_length_m = c(round(mean_route), round(mean_desire)))

#put these into a data frame, rounded to whole numbers


mean_lengths <- mean_lengths %>%
  mutate(
    mean_length_km = mean_length_m / 1000,
    percent_change = c(percent_change, NA)
  )
#add km conversion and percent change to the data frame, i converted to KM for ease of computation (ie: dividing by 1,000)

mean_lengths_print <- mean_lengths %>%
  mutate(
    mean_length_km = comma(round(mean_length_m / 1000)),
    percent_change = comma(round(percent_change))
  )
#tidy data for later printing in a kable
Show code
#LEAFLET PREP
#this step makes sure all the coordinate systems are correct and creates popup data and color palette for the leaflet map

nyc_leaflet  <- st_transform(nyc_sf, 4326)
roads_leaflet <- st_transform(osm_data$osm_lines, 4326)
desire_leaflet <- st_transform(desire_lines_all, 4326)
routes_leaflet <- st_transform(routes_sf, 4326)
#transform all data to WGS84 for leaflet mapping

desire_leaflet_popup <- paste0(
  "<b>Desire Line</b><br/>",
  "Origin: ", desire_leaflet$o, "<br/>",
  "Destination: ", desire_leaflet$d, "<br/>",
  "Desire Line Distance: ", round(lengths_tbl$desire_m / 1000), " km"
)
#create popup info for desire lines for interactive map
  
routes_leaflet_popup <- paste0(
  "<b>  Route</b><br/>",
  "Origin: ", desire_leaflet$o, "<br/>",
  "Destination: ", desire_leaflet$d, "<br/>",
  "Route Distance: ", round(lengths_tbl$route_m / 1000), " km<br/>"
)
#create popup info for routes for interactive map

pal_desire <- colorNumeric(
  palette = "viridis",
  domain  = lengths_tbl$desire_m
)
#create color palette for desire lines based on distance

pal_routes <- colorNumeric(
  palette = "inferno",
  domain  = lengths_tbl$route_m
)
#create color palette for routes based on distance

selected_ids <- unique(c(flow_all$o, flow_all$d))
#get unique IDs of sampled shelters

selected_shelters <- shelters_sf_fix %>% 
  filter(id %in% selected_ids)
#filter shelters to only those that were sampled
Show code
#LEAFLET MAP
#this step creates the interactive leaflet map with all the data layers

leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  
  addPolygons(data = nyc_leaflet,
              color = "black", weight = 3,
              fillOpacity = 0.1,
              group = "NYC Boundary") %>%
  # add background w/ cartodb tiles
  
  addPolylines(
    data = desire_leaflet,
    color = pal_desire(lengths_tbl$desire_m),
    weight = 3,
    opacity = 0.7,
    popup = desire_leaflet_popup,
    group = "Desire Lines"
  ) %>%
  
  addPolylines(
    data = routes_leaflet,
    color = pal_routes(lengths_tbl$route_m),
    weight = 3,
    opacity = 0.8,
    popup = routes_leaflet_popup,
    group = "Routes"
  ) %>%
  
  addLegend(
    pal = pal_desire,
    values = lengths_tbl$desire_m,
    title = "Desire Line Distance (m)",
    position = "bottomright",
    group = "Desire Lines"
  ) %>%
  #add a legend for desire lines
  addLegend(
    pal = pal_routes,
    values = lengths_tbl$route_m,
    title = "Route Distance (m)",
    position = "bottomleft",
    group = "Routes"
  ) %>%
  #add a legend for routes

    addCircleMarkers(data = shelters_sf_fix,
                   color = "blue",
                   radius = 1,
                   popup = ~id,
                   group = "Bus Shelters") %>%
  #add all bus shelters
  
  addCircleMarkers(
    data = selected_shelters,
    color = "purple",
    radius = 6,
    fillOpacity = 0.9,
    group = "Sampled Shelters"
  ) %>%
  #add sampled bus shelters with purple markers
  addLayersControl(
    overlayGroups = c("NYC Boundary", "Sampled Shelters",
                      "Desire Lines", "Routes",
                      "Bus Shelters"),
    options = layersControlOptions(collapsed = FALSE)
  )

Tables:

Show code
# Mean Lengths Table
mean_lengths_print %>%
  kable(col.names = c("Type", "Mean Length (m)", "Mean Length (km)", "Percent Change (%)"),
        caption = "Mean Lengths of Routes vs Desire Lines") %>%
  kable_styling(full_width = FALSE, position = "left")
Mean Lengths of Routes vs Desire Lines
Type Mean Length (m) Mean Length (km) Percent Change (%)
Route Length 17161 17 20
Desire Line Length 14323 14 NA
Show code
# Lengths Comparison Table
lengths_tbl_print %>% 
  select(-ID) %>%
kable(col.names = c("Origin", "Destination", "Route Length (km)", "Desire Line Length (km)", "Difference (km)"),
        caption = "Comparison of Route Lengths and Desire Line Lengths, By Difference") %>%
  kable_styling(full_width = FALSE, position = "center")
Comparison of Route Lengths and Desire Line Lengths, By Difference
Origin Destination Route Length (km) Desire Line Length (km) Difference (km)
S1016 S2489 30 16 14
S2966 S3347 45 34 10
S991 S2380 30 21 9
S3307 S3010 45 36 9
S457 S3327 22 14 8
S2579 S449 26 20 6
S3157 S2030 27 22 5
S2894 S1256 27 22 5
S1198 S2223 19 14 5
S1399 S2947 15 11 4
S3181 S905 18 14 4
S3080 S1006 19 15 4
S3325 S1905 34 30 4
S3121 S604 24 21 3
S649 S3068 20 17 3
S282 S2145 14 11 3
S3275 S1106 45 42 3
S803 S2860 19 17 2
S1086 S2135 14 12 2
S2558 S2943 13 11 2
S296 S1933 17 14 2
S601 S118 11 9 2
S1795 S273 12 10 2
S407 S2848 19 17 2
S40 S2514 17 15 2
S3098 S1665 17 15 2
S2283 S189 16 14 2
S2049 S3229 29 27 2
S2007 S1249 15 14 1
S555 S1666 9 8 1
S1664 S2448 11 10 1
S491 S2884 14 13 1
S558 S215 9 8 1
S1733 S2074 12 11 1
S665 S2583 19 18 1
S488 S310 10 9 1
S782 S1831 11 10 1
S1247 S2904 8 6 1
S40 S1612 12 11 1
S607 S2938 10 9 1
S922 S1434 5 5 1
S2841 S2417 12 10 1
S1590 S2731 7 6 1
S435 S1984 19 18 1
S3133 S53 6 5 1
S1676 S2087 6 5 1
S2887 S2329 2 1 0
S855 S2197 14 14 0
S255 S37 2 2 0
S885 S229 2 2 0

Figures:

Show code
# Bar Plot of Differences via plotly
gg1 <- ggplot(lengths_tbl_print, aes(x = ID, y = difference, text = paste("Origin: ", origin, "<br>Destination: ", destination))) +
  geom_bar(stat = "identity") +
  scale_fill_viridis_b()+
  labs(
    title = "Difference Between Route Length and Desire Line Length",
    x = "Route ID",
    y = "Difference (km)"
  ) +
  theme_dark()

gg1_plotly <- ggplotly(gg1)

gg1_plotly

Results:

The mean route length for the optimized routes for this particular sample run is 17.16 km, while the mean Euclidean desire line length is 14.32 km. This represents a percent change of 19.81% longer for the optimized routes compared to the direct desire lines. The interactive map above visualizes these routes, with desire lines colored based on their lengths and Open Street Routing Machine (OSRM) routes similarly colored with a different theme.

Discussion:

The results indicate that the optimized OSRM routes are significantly longer than the direct desire lines, which is generally expected given the constraints of the built environment and road network. The percent change of 19.81% suggests that while the desire lines represent the most direct path between two points, real-world travel must navigate around obstacles, follow roadways, and adhere to traffic regulations etc.

There were many obstacles when creating this project. Firstly, the stplanr has a number of vestigial functions that are served by “sf” package, such as “st_as_sf”, which made it difficult to find the right functions to use, often the code would break and I would have to go searching for the reason.

The OSRM routing service can also be unreliable at times, particularly for points that are very close together or in areas with limited road connectivity. This led to some origin-destination pairs not being counted as a “route”, which had to be filtered out of the final analysis. At first, these routes did not intersect with the correct Bus Shelters (ie: the routes were truly random, instead of 50 pairs, there were essentially 100).

As a result, the purrr package was used to map over the origin-destination pairs to create overlapping routes, which was a complex package to comprehend at first. Only the map2 function, which essentially applies a function to two lists in parallel, was used in this project.

Other functions such as “scales” (for “comma” & “round” functions tidying up data for presentation in a table), didn’t work properly at first, as they change the numeric to a character. Plotly and kablextra were straightforward enough.

Limitations:

  • This does not represent all bus stops in NYC, just shelters. Although the exact number of bus stops is difficult to find, the MTA states that there are 327 bus routes in the five boroughs and countless stops in between. To make the data manageable both in computation and visualization, this study only selects 50 at random. This limits the amount of data points and does not fully capture the bus network.
  • The OSRM routing service may not always find a route between two points, especially if they are very close together or in areas with limited road connectivity. The code removes these unroutable routes, and they are not shown in the data.
  • The analysis does not account for real-world factors such as traffic, hazards, closures, road conditions, bus “bunching” or transit schedules, which can significantly impact actual travel times and route efficiency.The analysis assumes that the shortest path is the most efficient, which may not always be the case in real-world scenarios.
  • The sample size of 50 origin-destination pairs is relatively small and is not be representative of the entire bus network in NYC.
  • Only “Primary” and “Secondary” roads are sampled here, as the computation for the smaller roads (tertiary etc.) was too processing heavy. This eliminates a large selection of routes.

Future:

  • Methods: Future research could expand the sample size to include more origin-destination pairs, or even all bus stops. Incorporating real-world travel time data, traffic patterns, and transit schedules could provide a more comprehensive understanding of route efficiency. Further analysis could also explore the impact of different modes of transportation, such as cycling or walking, on route optimization and efficiency.

  • Representation: A number of visual elements could be changed, such as mapping the two routes together, so that when you select one, the other one is highlighted. Different color schemes could be used for more differentiation of routes and desire lines. A buffer could be added to the polylines to make it easier for the user to select the polylines to see the popup information. Right now, you need to click directly on it which is difficult.

References:

  • A. J. Smit. Mapping with Google in R. Source.
  • CRAN. stplanr Paper and Vignettes. Source.
  • NYC Open Data. Bus Stop Shelters Dataset. Source.
  • Robin Lovelace. stplanr: R Package Documentation. Source.
  • openstreetmap.de. OpenStreetMap Routing Service. Source.
  • RStudio. Leaflet for R. Source.
  • MTA. New York City Transit Subway and Bus Facts 2019. Source.
  • NYC.gov. NYC Borough Boundaries Dataset. Source.